Hee Saw Dhuh Kaet - a 1963 computer speech demonstration on 7" 33 RPM record.
COMPUTER-PRODUCED SPEECH-EXAMPLES OF SYNTHESIZED SPEECH — BY BELL TELEPHONE LABORATORIES FOR EDUCATIONAL USE.
COMPUTER SPEECH
The examples of computer speech on this recording illustrate certain fundamental characteristics of both artificial and human speech. [This] is a product of Bell Telephone Laboratories research into the basic nature of speech and hearing.
This recording contains samples of synthesized speech—speech artificially constructed from the basic building blocks of the English language.
A machine which produces synthesized speech is called, fittingly, a talking machine. There are many possible kinds of speech synthesizers or talking machines. Instead of building and testing a variety of them, scientists at Bell Telephone Laboratories simulate their behaviour with a high-speed, general-purpose computer.
The computer is instructed (programmed) to accept in sequence on punched cards the names of the speech sounds which make up an English sentence. It then processes this information, in accordance with the linguistic rules governing the English language, and produces an output analogous to the output of the talking machine it is programmed to simulate.
The talking machine simulated by the computer in this recording would normally be operated by continuously feeding it a set of nine control signals. The signals correspond to voice pitch, voice loudness, lip opening and other speech variables. When every instant of sound is specified, and every variable accounted for, such a machine produces human-sounding speech.
Setting up the computer to simulate this talking machine requires two sets of instructions or, more precisely, a two-part computer program. One part of the computer program performs the actual sound-making function: it imitates the "talking" of a talking machine. The second part consists of rules for combining individual speech sounds into connected speech, and for producing the nine control signals that activate the talking machine.
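The two-part structure described above can be sketched in modern Python. This is only an illustration under stated assumptions, not the Bell Telephone Laboratories program: the names `ControlFrame`, `rules_part` and `synthesizer_part` are hypothetical, the text names only three of the nine control signals (pitch, loudness, lip opening), so the other six are carried as unnamed placeholders, and a plain sine oscillator stands in for the sound-making part.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE = 8000  # samples per second; an arbitrary choice for this sketch


@dataclass
class ControlFrame:
    """One instant of the nine control signals (hypothetical layout)."""
    pitch_hz: float            # voice pitch
    loudness: float            # voice loudness, 0.0 to 1.0
    lip_opening: float         # lip opening, 0.0 (closed) to 1.0 (open)
    other: tuple = (0.0,) * 6  # the six signals the text does not name


def rules_part(sounds, frames_per_sound=10):
    """Second part: turn a sequence of speech-sound names into control frames.

    Real rules would depend on each sound and its neighbours. This stand-in
    emits the same flat frame for every sound.
    """
    frames = []
    for _name in sounds:
        for _ in range(frames_per_sound):
            frames.append(ControlFrame(pitch_hz=120.0, loudness=0.5, lip_opening=0.5))
    return frames


def synthesizer_part(frames, samples_per_frame=80):
    """First part: turn control frames into audio samples.

    A sine oscillator is an assumed placeholder for the simulated talking machine.
    """
    samples = []
    phase = 0.0
    for frame in frames:
        step = 2.0 * math.pi * frame.pitch_hz / SAMPLE_RATE
        for _ in range(samples_per_frame):
            samples.append(frame.loudness * math.sin(phase))
            phase += step
    return samples


frames = rules_part(["H", "EE"])
audio = synthesizer_part(frames)
```

Keeping the two parts separate mirrors the description: the rules can be revised without touching the sound-making code, and the same control frames could drive a different simulated machine.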
Scientists at Bell Telephone Laboratories have developed a computer program that permits them to feed the names of speech sounds into the computer on punched cards. They also have devised a phonetic code using the letters of the alphabet. At present, it is made up of 22 consonant and 12 vowel sounds:
CONSONANTS: P, B, T, D, K, G, M, N, NG (as in sing), F, V, S, Z, SH (as in she), ZH (as in azure), H, W, R, L, Y, TH (as in this), DH (as in then)
VOWELS: EE (as in bee), I (as in ill), AY (?), E (as in end), AE (as in add), AH (as in ?), AW (as in jaw), O (as in go), OO (as in foot), ? (as in food), UH (as in up), ER (as in her)
Each speech sound is specified on a separate punched card. When a sequence of cards is fed into the computer, it "operates" on the information, following the rules set up in the second part of its program, to produce the nine control signals that activate the talking machine program.
For example, if the sequence of cards H, EE, S, AW, DH, UH, K, AE, T is fed into the computer, the machine will say, "He saw the cat," in flat monotones. Proper inflection and phrasing are achieved by specifying on each card the changes in pitch and timing natural to human speech.
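The card sequence for this sentence can be modelled as a short Python sketch. Everything beyond the symbols themselves is an assumption made for illustration: the `Card` class, the `read_cards` and `falling_contour` helpers, the use of hertz and milliseconds, and the default values are all hypothetical. The vowel set holds only the symbols legible in the text, with AW taken from the example sentence.

```python
from dataclasses import dataclass

# Consonant symbols of the phonetic code as listed in the text.
CONSONANTS = {
    "P", "B", "T", "D", "K", "G", "M", "N", "NG", "F", "V",
    "S", "Z", "SH", "ZH", "H", "W", "R", "L", "Y", "TH", "DH",
}
# Vowel symbols that are legible in the text; the unreadable one is omitted.
VOWELS = {"EE", "I", "AY", "E", "AE", "AH", "AW", "O", "OO", "UH", "ER"}


@dataclass
class Card:
    """One punched card: a speech sound plus assumed pitch and timing fields."""
    sound: str
    pitch_hz: float = 120.0  # assumed default, giving a flat monotone
    duration_ms: int = 100   # assumed default


def read_cards(text, sep="-"):
    """Split a string such as 'H-EE-S' into validated cards."""
    cards = []
    for name in text.split(sep):
        name = name.strip().upper()
        if name not in CONSONANTS | VOWELS:
            raise ValueError(f"unknown speech sound: {name!r}")
        cards.append(Card(name))
    return cards


def falling_contour(cards, start_hz=140.0, end_hz=100.0):
    """Return copies of the cards with a linearly falling pitch (hypothetical rule)."""
    n = len(cards)
    step = (end_hz - start_hz) / (n - 1) if n > 1 else 0.0
    return [Card(c.sound, start_hz + i * step, c.duration_ms)
            for i, c in enumerate(cards)]


monotone = read_cards("H-EE-S-AW-DH-UH-K-AE-T")  # every card at the same pitch
inflected = falling_contour(monotone)            # pitch now varies card by card
```

Here `monotone` corresponds to the flat delivery, while `inflected` shows how per-card pitch values could carry a simple intonation pattern.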
By specifying the pitch of the sounds, it also is possible to make the computer sing. In two of the samples recorded, the computer first sings a familiar tune and then, singing the same song, is accompanied by music played by another computer.
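Singing amounts to drawing each sound's pitch from a musical scale. As a hedged illustration, a note number can be converted to a frequency and paired with a speech sound. The helper names, the equal-tempered tuning with A above middle C at 440 Hz, the modern note-numbering convention (69 for that A) and the example notes are all assumptions, not details taken from the recording.

```python
def note_to_hz(note_number, a4_hz=440.0):
    """Equal-tempered frequency for a note number, where 69 is the A at a4_hz."""
    return a4_hz * 2.0 ** ((note_number - 69) / 12)


def sing(sounds, notes):
    """Pair each speech sound with the frequency of its note."""
    if len(sounds) != len(notes):
        raise ValueError("need exactly one note per sound")
    return [(sound, note_to_hz(note)) for sound, note in zip(sounds, notes)]


# Hypothetical example: four sounds sung on two different notes.
melody = sing(["L", "AH", "L", "AH"], [60, 60, 67, 67])
```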
The "speech" of the simulated talking machine comes out of the computer as tiny magnetized spots on half-inch magnetic tape. The tape is fed to another machine which converts the spots to a sound tape suitable for playing on an ordinary tape recorder.
The first eight and very last samples of synthesized speech on this recording are part of a research program aimed, principally, at formulating a minimum set of rules for making plausible English speech.
The ninth and tenth selections were produced by analyzing a person’s speech and reconstructing it synthetically on a computer. The objective of this program is to duplicate the sounds and transitions made by a human speaker, including his accent and dialect.
Knowledge developed through such research programs may be useful in devising new techniques for transmitting speech more efficiently over communications systems.
In the near future, for example, a person may be able to type on a keyboard and cause a typing machine thousands of miles away to speak for him. There is also the possibility that talking machines could be built for people who are unable to speak.